/*

This file is part of the replication packet for "A Low-Cost Information Nudge Increases Citizenship Application Rates Among Low-Income Immigrants"

Purpose: This script reads in the SIPP data from 2008 and produces an estimate for the number of immigrants eligible to apply for citizenship using the Federal Fee Waiver. 

Information about the 2008 SIPP and specifically the Panel Wave 02 can be found here: https://www.census.gov/programs-surveys/sipp/data/2008-panel/wave-2.html. 
The Wave 02 data dictionary at that site is especially useful for defining the population of interest.

While working on these scripts, we relied heavily on the scripts from Jean Roth at NBER to read in the SIPP data. 

These excellent files simplify reading in the SIPP data and are copyrighted by NBER and Jean Roth. They can be found online at: http://www.nber.org/sipp/2008/.

The data files used in this script are:

sippp08putm2.dta
sippl08puw2.dta


*/          



/* Process for estimating the percent of eligible LPRs who could use the fee waiver to naturalize

1) Load in the topical wave data

2) Construct an immigration status variable

3) Create a years of residency variable

4) Create a years-needed-to-naturalize variable

5) Assign an eligible-to-naturalize variable based on immigration status, years of residency, and years-needed-to-naturalize

6) Keep only naturalization-eligible LPRs

7) Merge in the core data

8) Calculate a flag for whether a person's income is below 150% of the Federal Poverty Guidelines

9) Calculate a flag for whether a person receives means-tested benefits

10) Calculate a flag for a person's eligibility for a fee waiver

11) Calculate the share of LPRs who are eligible to naturalize and eligible for a fee waiver

*/




clear  all


* name the directory where the files are saved
global ppath = ""


* set the working directory
cd "$ppath/"



* topical: sippp08putm2.dta
* ssuid eentaid epppnum
* one row is one person

* core data: sippl08puw2.dta 
* ssuid eentaid epppnum srefm
* one row is one person-reference month




****** load in the topical module, which has questions on immigration history and status

* immigration module
use sippp08putm2, clear
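
* Optional sanity check (not part of the original script): the notes above say
* one row is one person in the topical module, so ssuid eentaid epppnum should
* uniquely identify rows. capture noisily reports a violation without stopping
* the do-file; the later 1:m merge relies on this uniqueness.
capture noisily isid ssuid eentaid epppnum
if _rc != 0 {
    display as error "ssuid eentaid epppnum do not uniquely identify rows in the topical module"
}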

* looking at variables that will be used to categorize immigrants into their status type


* EAMGUNV - Universe indicator, All persons 15+ at the end of reference period.
* the universe is all people 15+ (eamgunv)
tab eamgunv, nolab



* citizenship
* ecitiznt - US Citizenship Status of Respondent, Is ... a citizen of the United States?
tab ecitiznt

* if a citizen, how did you become a citizen
* ENATCITT - How the respondent became a US citizen, How is ... a U.S. citizen?
tab enatcitt

* if naturalized or citizen through spouse --> assume you entered as an immigrant
* ask status on entry

* status on entry is asked of:
* naturalized citizens, citizens through spouse, or noncitizens
* TIMSTAT - Immigration status upon entry to the U.S. When ... moved to the U.S. to live, what was ...'s immigration status?
* U All persons 15+ at the end of reference period who were not born in the U.S. and whose citizenship is not due to adoption, birth in an island area or birth abroad to U.S. citizen parents or who are not citizens.
* (EPOPSTAT=1 AND EPPMIS4=1 AND EBORNUS=2 AND (ENATCITT=1,2,6 OR ECITIZNT=2))
tab timstat

* adjusted status
* EADJUST - Whether status has changed to permanent resident
*  Has ...'s status been changed to permanent resident?
tab eadjust


* adjustment of status is asked only of those who report "other" status on entry and are not citizens now
bys ecitiznt: tab timstat eadjust, mis



* born outside US
tab tbrstate, nolab
gen foreignborn = 0 if tbrstate < 60
replace foreignborn = 1 if tbrstate > 60
replace foreignborn = . if eamgunv == -1
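
* Optional check (not part of the original script): the two conditions above
* are strict inequalities, so a record with tbrstate == 60 gets neither 0 nor 1.
* This counts in-universe records whose foreignborn flag is still missing.
count if missing(foreignborn) & eamgunv != -1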


* coding LPR status
* LPRs are people who are foreign born and permanent residents
* LPRs also include people who entered with another status but have since adjusted their status

gen immig_status = ""
replace immig_status = "Born Citizen" if ecitiznt == 1
replace immig_status = "Naturalized Citizen" if ecitiznt == 1 & tbrstate > 60
replace immig_status = "LPR" if tbrstate > 60 & ecitiznt == 2 & timstat == 1
replace immig_status = "Undocumented" if tbrstate > 60 & ecitiznt == 2 & timstat == 2
replace immig_status = "LPR" if immig_status == "Undocumented" & eadjust == 1
replace immig_status = "Not in univerise - eamgunv" if eamgunv == -1


tab immig_status, missing
* estimated that 3,777 LPR remain in sample

 

* getting those eligible to naturalize

* duration of US residence, in topical
tab tmoveus
* calculating year of entry from the question tmoveus

gen     yearofentry = 1961    if tmoveus==1
replace yearofentry = 1964.5  if tmoveus==2
replace yearofentry = 1971    if tmoveus==3
replace yearofentry = 1976    if tmoveus==4
replace yearofentry = 1979.5  if tmoveus==5
replace yearofentry = 1982    if tmoveus==6
replace yearofentry = 1984.5  if tmoveus==7
replace yearofentry = 1987    if tmoveus==8
replace yearofentry = 1989.5  if tmoveus==9
replace yearofentry = 1991.5  if tmoveus==10
replace yearofentry = 1993.5  if tmoveus==11
replace yearofentry = 1995.5  if tmoveus==12
replace yearofentry = 1997.5  if tmoveus==13
replace yearofentry = 1999    if tmoveus==14
replace yearofentry = 2000    if tmoveus==15
replace yearofentry = 2001    if tmoveus==16
replace yearofentry = 2002.5  if tmoveus==17
replace yearofentry = 2004    if tmoveus==18
replace yearofentry = 2005    if tmoveus==19
replace yearofentry = 2006    if tmoveus==20
replace yearofentry = 2007    if tmoveus==21
replace yearofentry = 2008    if tmoveus==22
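
* Optional check (not part of the original script): list any tmoveus codes
* that the mapping above leaves without a year of entry.
tab tmoveus if missing(yearofentry), missing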

* calculating year of adjustment from tadyear

gen yearofadjust = .  if tadyear== -1
replace yearofadjust = 1980    if tadyear==1
replace yearofadjust = 1982  if tadyear==2
replace yearofadjust = 1985.5    if tadyear==3
replace yearofadjust = 1987.5  if tadyear==4
replace yearofadjust = 1989.5  if tadyear==5
replace yearofadjust = 1993  if tadyear==6
replace yearofadjust = 1996  if tadyear==7
replace yearofadjust = 1998.5  if tadyear==8
replace yearofadjust = 2000    if tadyear==9
replace yearofadjust = 2001    if tadyear==10
replace yearofadjust = 2002    if tadyear==11
replace yearofadjust = 2003  if tadyear==12
replace yearofadjust = 2004    if tadyear==13
replace yearofadjust = 2005    if tadyear==14
replace yearofadjust = 2006    if tadyear==15
replace yearofadjust = 2007    if tadyear==16
replace yearofadjust = 2008    if tadyear==17

* calculating years of residency for LPRs
gen  residency = 2008-yearofentry
replace residency = 2008 - yearofadjust if yearofadjust != .
gen years_in_country = 2008 - yearofentry
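
* Optional check (not part of the original script): residency based on the
* adjustment year should not exceed years in the country. A nonzero count means
* the reported (midpoint-coded) adjustment year precedes the year of entry.
count if residency > years_in_country & !missing(residency)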


* marital status 
* ems = 1 is "married, spouse present"
tab     ems
gen married = ems==1


* normally naturalization requires five years of residency, but it can be shorter (3 years in this script) if the spouse is a citizen
gen residencyrequired = 5 


* errp is household relationship
* errp = 3 means that the person is the spouse of the reference person

* determining if spouse is a citizen
gen  spouseus = errp==3 & (immig_status=="Naturalized Citizen" | immig_status=="Born Citizen" )
* summing up the spouse-is-a-citizen flag for the entire household
* (total() is the current name of the older egen sum() function; same result)
egen spouseus_fam = total(spouseus) , by(ssuid rfid)

* errp = 1 means reference person with related persons in the household
* determining if the reference person is a US citizen
gen  refus = errp==1 & (immig_status=="Naturalized Citizen" | immig_status=="Born Citizen" )
* summing up the reference-person-is-a-citizen flag for the entire household
* (total() is the current name of the older egen sum() function; same result)
egen refus_fam = total(refus) , by(ssuid rfid)

* replacing the residency years required if the spouse is a US citizen
replace residencyrequired = 3 if  spouseus_fam == 1 & errp==1 
replace residencyrequired = 3 if  refus_fam == 1 & errp==3 
 
bys errp: tab residencyrequired married 


* estimates eligibility for citizenship depending on the residency years required 
gen     elegible = (residency >= residencyrequired) & residency!=.
replace elegible = . if immig_status!="LPR"

bys residencyrequired: tab residency elegible



* assign eligibility for children

* determining if the parents are eligible to naturalize
gen  parent1eleg = errp==1 & elegible==1
gen  parent2eleg = errp==3 & elegible==1
egen parent1eleg_fam = total(parent1eleg) , by(ssuid rfid)
egen parent2eleg_fam = total(parent2eleg) , by(ssuid rfid)
egen parentbotheleg_fam = rowtotal(parent1eleg_fam parent2eleg_fam)

* determining if the parents are US citizens
gen  parent1citiz = errp == 1 & (immig_status=="Naturalized Citizen" | immig_status=="Born Citizen" )
gen  parent2citiz = errp == 3 & (immig_status=="Naturalized Citizen" | immig_status=="Born Citizen" )
egen parent1cit_fam = total(parent1citiz) , by(ssuid rfid)
egen parent2cit_fam = total(parent2citiz) , by(ssuid rfid)
egen parentbothcit_fam = rowtotal(parent1cit_fam parent2cit_fam)

* determining the residency for the family
gen  parentres = years_in_country if  errp == 1 
replace  parentres = years_in_country if  errp == 3 
egen max_parent_res = max(parentres) , by(ssuid rfid)


* assign eligibility to children with at least one parent who is eligible and whose parents are not citizens
replace elegible = 1 if parentbotheleg_fam >=1 & parentbothcit_fam == 0 & eamgunv == -1
* resets eligibility for anyone who was misassigned and already had their own status
replace elegible = . if immig_status=="Born Citizen" | immig_status=="Undocumented"
* removes children that were born when a parent was already present in the US 
gen bornUScit = elegible == 1 & tage < max_parent_res & eamgunv == -1  & errp == 4
replace elegible = . if tage < max_parent_res & eamgunv == -1 & errp == 4
* assign LPR status to those who were made eligible in the child assignment
replace immig_status="LPR" if elegible == 1 &  immig_status=="Not in univerise - eamgunv"



* share of eligible in the LPR population
tab immig_status elegible, row missing
* 68.4% of LPRs are eligible to naturalize


* subset to people eligible to naturalize
keep if elegible == 1 & immig_status=="LPR"


* merge the core module into the topical module to calculate eligibility for the fee waiver. Income and the poverty ratio are needed.

* SSUID uniquely identifies each initially sampled dwelling  unit
* EENTAID (ENTRY) identifies the address where the person lived at the time  she or he was first interviewed.
* EPPPNUM -  In the 1996 Panel, EPPPNUM uniquely identifies a  person within the sample unit.

 
cd "$ppath/rawdata"
merge 1:m ssuid eentaid epppnum using sippl08puw2.dta  
keep if _merge==3



* keep second reference month only
* there are four reference months in the SIPP data, so there are multiple observations for each person. We use the second reference month to determine their use of benefits
* other reference months may give slightly different results
keep if srefm==2
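
* Optional sanity check (not part of the original script): after keeping a
* single reference month, each person should appear once. capture noisily
* reports a violation without stopping the do-file.
capture noisily isid ssuid eentaid epppnum
if _rc != 0 {
    display as error "more than one row per person remains after keeping srefm == 2"
}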

* generating the income to poverty ratio. Someone must be below 150% of the Federal Poverty Guidelines to qualify (or have a means-tested benefit)

* thtotinc - Total household income
* rhpov - Poverty threshold for this household in  this month. 


gen     inctopov = thtotinc / rhpov
replace inctopov = 0 if inctopov<0
replace inctopov = inctopov*100

* generating flag for below 150% 
gen below150 = inctopov <150
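
* Optional check (not part of the original script): Stata treats missing as
* larger than any number, so a missing inctopov (for example when rhpov is 0
* or missing) yields below150 == 0 rather than missing. This counts such rows.
count if missing(inctopov)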

* calculating means-tested benefits

* thafdc - Total household public assistance payments 
* thfdstp - Total Household food stamps Received
* rcutyp57 - Medicaid coverage flag

gen mtb_pubassist = thafdc>0 & thafdc!=.
gen mtb_food      = thfdstp>0 & thfdstp!=.
gen mtb_medic     = rcutyp57==1

* flag for any benefits received

egen mtb_tot = rowmax(mtb_*)

* fee waiver eligible if receiving means-tested benefits (mtb) or below 150% of the Federal Poverty Guidelines
egen feewaiver = rowmax(mtb_tot below150)

* percent of LPRs eligible for fee waiver
tab feewaiver, mis
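
* Optional breakdown (not part of the original script): cross-tabulate the two
* qualifying routes to see how many people qualify through means-tested
* benefits, through income below 150%, or through both.
tab mtb_tot below150, mis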




